Fast Logistic Regression for Data Mining, Text Classification and Link Detection
Authors
Abstract
Previous work by the authors [1] demonstrated that logistic regression can be a fast and accurate data mining tool for life sciences datasets, competitive with modern tools such as support vector machines and ball-tree-based k-nearest neighbors (k-NN). This paper has two objectives. The first is a thorough empirical comparison of logistic regression with several classical and modern learners on a variety of learning tasks. The second is to describe our use of the conjugate gradient method inside an iteratively reweighted least squares (IRLS) fitting procedure.
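The fitting procedure named in the abstract can be sketched roughly as follows. Each IRLS iteration is a Newton step for the logistic log-likelihood, and instead of factorizing the weighted normal-equations matrix, the linear system is solved approximately with conjugate gradient (CG). This is a minimal NumPy sketch under our own assumptions, not the authors' implementation: the small ridge penalty `ridge`, the fixed iteration counts, the relative CG tolerance, warm-starting CG from the previous coefficients, and the helper names `conjugate_gradient` and `irls_cg` are all illustrative choices.

```python
import numpy as np


def conjugate_gradient(A, b, x0, max_iter=50, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A."""
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    r = b - A @ x                  # residual
    p = r.copy()                   # search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        # Stop once the residual is small relative to the right-hand side.
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def irls_cg(X, y, ridge=1e-3, n_iter=30, cg_iter=50):
    """Fit ridge-penalized logistic regression by IRLS, solving each
    weighted least squares subproblem with conjugate gradient.

    X: (n, d) design matrix (include a column of ones for an intercept).
    y: (n,) array of 0/1 labels.
    ridge, n_iter, cg_iter: illustrative settings (assumptions).
    """
    _, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        eta = X @ beta                             # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))            # predicted probabilities
        w = np.clip(mu * (1.0 - mu), 1e-10, None)  # IRLS weights
        z = eta + (y - mu) / w                     # working response
        # Weighted normal equations: (X^T W X + ridge * I) beta = X^T W z
        A = X.T @ (w[:, None] * X) + ridge * np.eye(d)
        b = X.T @ (w * z)
        # Warm-start CG from the current coefficients.
        beta = conjugate_gradient(A, b, beta, max_iter=cg_iter)
    return beta
```

With the ridge term included, each IRLS update is exactly a Newton step on the penalized negative log-likelihood. Forming `A` explicitly is fine for a handful of columns; for wide, sparse data one would typically hand CG only the matrix-vector products X^T(W(Xp)), which is where CG pays off over a direct solver.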
Similar resources
The Probit Link Function in Generalized Linear Models for Data Mining Applications
The use of logistic regression for outcome classification of dichotomous variables is well known in data mining applications. The logit transformation belongs to the class of canonical link functions, each of which follows from a particular probability distribution. A closely related model is the probit link, which can also be used for binary responses. Although the probit l...
Financial Reporting Fraud Detection: An Analysis of Data Mining Algorithms
In the last decade, high-profile financial frauds committed by large companies in both developed and developing countries were discovered and reported. This study compares the performance of five popular statistical and machine learning models in detecting financial statement fraud. The research objects are companies which experienced both fraudulent and non-fraudulent financial statements betw...
Simple decision forests for multi-relational classification
An important task in multi-relational data mining is link-based classification, which takes advantage of attributes of links and linked entities to predict the class label. The relational naive Bayes classifier exploits independence assumptions to achieve scalability. We introduce a weaker independence assumption to the effect that information from different data tables is independent given the c...
Text Analysis and Sentiment Polarity on FIFA World Cup 2014 Tweets
Social media has become one of the most popular communication tools for sharing opinions and everyday life-related events. Twitter, as a micro-blogging service, can be used to discover events and news in real time from anywhere in the world. As Twitter posts (tweets) are short and are being generated constantly, they are well-suited sources of streaming data for opinion mining and sentiment polari...
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...
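The probit entry in the list above contrasts two link functions for binary responses. As a rough illustration, here is a small sketch of the two inverse links, each mapping a linear predictor eta to a probability: the logistic function for the logit link and the standard normal CDF for the probit link. The function names are hypothetical; the constant 1.702 is the commonly cited scaling under which the logistic function of 1.702·eta stays within about 0.01 of the normal CDF at eta.

```python
import math


def inv_logit(eta):
    """Logistic function: inverse of the canonical logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Logit link: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_probit(eta):
    """Standard normal CDF: inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Compare the probabilities both links assign to the same linear predictor,
# plus the rescaled logistic curve that approximates the probit curve.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"eta={eta:+.1f}  logit p={inv_logit(eta):.3f}  "
          f"probit p={inv_probit(eta):.3f}  "
          f"scaled logit p={inv_logit(1.702 * eta):.3f}")
```

Both curves are symmetric about eta = 0 and differ mainly in scale, which is why logit and probit fits usually give similar classifications, with coefficients differing by roughly that constant factor.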